
feat: Add database backup and restore utilities #26607

Open
pmbrull wants to merge 12 commits into main from
task/add-database-backup-and-restore-utilitie-3e452359

Conversation


@pmbrull pmbrull commented Mar 19, 2026

Summary

  • Adds DatabaseBackupRestore utility class for full database backup and restore via JDBI
  • Adds backup and restore CLI commands to OpenMetadataOperations
  • Adds MigrationTestRunner with MigrationTestCase interface for migration validation
  • Adds test-migration CLI command that restores a backup, runs pending migrations one by one, and executes before/after test assertions
  • Supports both MySQL and PostgreSQL using information_schema for dynamic table/column discovery
  • Automatically skips GENERATED columns during backup (entity tables store derived columns from JSON)
  • Backup creates a .tar.gz archive with metadata.json (timestamp, version, db type, per-table info) and one JSON file per table
  • Restore validates database type match, supports --force flag to truncate existing data, and handles FK constraints
  • Migration test summary prints to stdout for easy PR attachment
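
Based on the bullets above, a backup archive would be laid out roughly as follows. The table file name and the rowCount field are illustrative assumptions; timestamp, version, databaseType, tables, columns, and binaryColumns are the fields named elsewhere in this PR.

```text
backup.tar.gz
├── metadata.json              # timestamp, version, db type, per-table info
└── tables/
    ├── table_entity.json      # one JSON stream of row objects per table
    └── ...

metadata.json (sketch):
{
  "timestamp": "2026-03-19T14:30:00Z",
  "version": "1.11.0",
  "databaseType": "MYSQL",
  "tables": {
    "table_entity": {
      "columns": ["id", "json", "..."],
      "binaryColumns": ["..."],
      "rowCount": 123
    }
  }
}
```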

Usage

# Backup
./openmetadata-ops.sh backup -c conf/openmetadata.yaml --backup-path /path/to/backup.tar.gz

# Restore (fails if tables have data)
./openmetadata-ops.sh restore -c conf/openmetadata.yaml --backup-path /path/to/backup.tar.gz

# Restore with force (truncates existing tables first)
./openmetadata-ops.sh restore -c conf/openmetadata.yaml --backup-path /path/to/backup.tar.gz --force

# Test migrations against a backup
./openmetadata-ops.sh test-migration -c conf/openmetadata.yaml --backup-path /path/to/backup.tar.gz

Migration Test Framework

Define test classes per migration version following the naming convention:

package org.openmetadata.service.migration.tests.v1120; // for version 1.12.0

public class MigrationTest implements MigrationTestCase {
  @Override
  public List<TestResult> validateBefore(Handle handle) {
    // assertions before the migration runs
    return List.of();
  }

  @Override
  public List<TestResult> validateAfter(Handle handle) {
    // assertions after the migration runs
    return List.of();
  }
}
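
TestResult is described elsewhere in this PR as "a small result record with pass/fail helpers". A minimal sketch of what that record might look like — the exact field and helper names are assumptions, not the PR's actual class:

```java
import java.util.List;

// Hypothetical minimal shape of TestResult; the real class in the PR may differ.
public record TestResult(String testName, boolean passed, String message) {

  // Helper for a passing assertion.
  public static TestResult pass(String testName) {
    return new TestResult(testName, true, null);
  }

  // Helper for a failing assertion, carrying a diagnostic message for the summary
  // table, e.g. fail("config migrated", "Expected 5 rows, got 3").
  public static TestResult fail(String testName, String message) {
    return new TestResult(testName, false, message);
  }

  // Convenience for returning a single result from validateBefore/validateAfter.
  public static List<TestResult> single(TestResult result) {
    return List.of(result);
  }
}
```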

The test-migration command outputs a summary table:

============================================================
              Migration Test Summary
============================================================
 Source version  : 1.11.0
 Target version  : 1.12.0
 Database type   : MYSQL
 Backup timestamp: 2026-03-19T14:30:00Z
------------------------------------------------------------
 Migration | Test                         | Phase  | Result
------------------------------------------------------------
 1.11.1    | (no tests)                   |   -    |   -
 1.11.2    | column X exists              | BEFORE |  PASS
 1.12.0    | config migrated              | AFTER  |  FAIL
           |   Expected 5 rows, got 3     |        |
------------------------------------------------------------
 Total: 1 passed, 1 failed
============================================================

Test plan

  • Unit tests for extractDatabaseName (4 tests)
  • Unit tests for versionToPackage (5 tests)
  • Manual test: backup against a local MySQL database, verify archive structure
  • Manual test: restore to an empty database, verify data integrity
  • Manual test: restore without --force on non-empty database fails with table listing
  • Manual test: restore with --force truncates and restores successfully
  • Manual test: test-migration with a sample MigrationTest class

pmbrull and others added 3 commits March 19, 2026 18:35
Introduces a new utility class that provides database backup and restore
functionality for OpenMetadata's CLI. It discovers all tables via
information_schema, filters out GENERATED columns, dumps everything to
a .tar.gz archive with metadata, and restores from such archives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 19, 2026 17:42
@github-actions github-actions bot added the Ingestion and safe to test labels Mar 19, 2026
Copilot AI left a comment

Pull request overview

Adds an ops-level database backup/restore capability to OpenMetadata, wiring new backup / restore CLI commands into OpenMetadataOperations and implementing archive-based export/import via JDBI for MySQL/PostgreSQL.

Changes:

  • Introduces DatabaseBackupRestore utility to write/read a .tar.gz backup with per-table JSON plus metadata.json.
  • Adds backup and restore Picocli commands to OpenMetadataOperations.
  • Adds unit tests for extractDatabaseName JDBC URL parsing.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/util/DatabaseBackupRestore.java Implements table/column discovery, backup archive creation, and restore logic.
openmetadata-service/src/main/java/org/openmetadata/service/util/OpenMetadataOperations.java Wires backup / restore commands into the ops CLI.
openmetadata-service/src/test/java/org/openmetadata/service/util/DatabaseBackupRestoreTest.java Adds unit tests for JDBC database-name extraction helper.
openmetadata-service/pom.xml Adds commons-compress dependency for tar/gzip support.


Comment on lines +242 to +250
        });

    try {
      jdbi.useHandle(this::disableForeignKeyChecks);
      restoreTablesFromArchive(backupPath, tablesMetadata);
      LOG.info("Restore completed successfully");
    } finally {
      jdbi.useHandle(this::enableForeignKeyChecks);
    }
Comment on lines +294 to +296
byte[] content = tais.readAllBytes();
ArrayNode rows = (ArrayNode) MAPPER.readTree(content);

Comment on lines +153 to +160
    int offset = 0;
    ArrayNode allRows = MAPPER.createArrayNode();

    while (true) {
      String sql =
          String.format(
              "SELECT %s FROM %s LIMIT %d OFFSET %d",
              quotedColumns, quotedTable, BATCH_SIZE, offset);
Comment on lines +360 to +368
          JsonNode val = row.get(col);
          if (val == null || val.isNull()) {
            batch.bindNull(col, Types.VARCHAR);
          } else if (val.isNumber()) {
            if (val.isLong() || val.isInt() || val.isBigInteger()) {
              batch.bind(col, val.longValue());
            } else {
              batch.bind(col, val.doubleValue());
            }
Comment on lines +159 to +160
"SELECT %s FROM %s LIMIT %d OFFSET %d",
quotedColumns, quotedTable, BATCH_SIZE, offset);
Comment on lines +215 to +251
  public void restore(String backupPath, boolean force) throws IOException {
    LOG.info("Starting database restore from {}", backupPath);

    ObjectNode metadata = readMetadata(backupPath);
    String backupDbType = metadata.get("databaseType").asText();
    if (!backupDbType.equals(connectionType.name())) {
      throw new IllegalStateException(
          String.format(
              "Backup database type '%s' does not match current connection type '%s'",
              backupDbType, connectionType.name()));
    }

    LOG.info(
        "Backup info - version: {}, timestamp: {}, databaseType: {}",
        metadata.get("version").asText(),
        metadata.get("timestamp").asText(),
        backupDbType);

    ObjectNode tablesMetadata = (ObjectNode) metadata.get("tables");

    jdbi.useHandle(
        handle -> {
          if (force) {
            truncateAllTables(handle, tablesMetadata);
          } else {
            validateTablesEmpty(handle, tablesMetadata);
          }
        });

    try {
      jdbi.useHandle(this::disableForeignKeyChecks);
      restoreTablesFromArchive(backupPath, tablesMetadata);
      LOG.info("Restore completed successfully");
    } finally {
      jdbi.useHandle(this::enableForeignKeyChecks);
    }
  }
"SELECT column_name FROM information_schema.columns "
+ "WHERE table_schema = 'public' AND table_name = :table "
+ "AND (is_generated = 'NEVER' OR is_generated IS NULL) "
+ "AND (column_default NOT LIKE 'nextval%' OR column_default IS NULL) "
batch.bind(col, val.doubleValue());
}
} else if (val.isBoolean()) {
batch.bind(col, val.booleanValue());
Comment on lines +381 to +387
  private void disableForeignKeyChecks(Handle handle) {
    if (connectionType == ConnectionType.MYSQL) {
      handle.execute("SET FOREIGN_KEY_CHECKS = 0");
    } else {
      handle.execute("SET session_replication_role = 'replica'");
    }
  }
pmbrull and others added 2 commits March 19, 2026 19:11
…mary output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a test-migration command to OpenMetadataOperations that:
- Restores a backup, then runs pending migrations one by one
- For each migration with a test class, runs validateBefore/validateAfter
- Prints a summary table to stdout for PR validation
- Returns exit code 0 (all pass) or 1 (any fail)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot commented Mar 19, 2026

🟡 Playwright Results — all passed (17 flaky)

✅ 3387 passed · ❌ 0 failed · 🟡 17 flaky · ⏭️ 183 skipped

Shard      | Passed | Failed | Flaky | Skipped
🟡 Shard 1 |    452 |      0 |     3 |       2
✅ Shard 2 |    305 |      0 |     0 |       1
🟡 Shard 3 |    667 |      0 |     6 |      33
🟡 Shard 4 |    683 |      0 |     2 |      41
🟡 Shard 5 |    670 |      0 |     2 |      73
🟡 Shard 6 |    610 |      0 |     4 |      33
🟡 17 flaky test(s) (passed on retry)
  • Flow/Tour.spec.ts › Tour should work from welcome screen (shard 1, 1 retry)
  • Flow/Tour.spec.ts › Tour should work from URL directly (shard 1, 1 retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Features/DataQuality/TestCaseImportExportBasic.spec.ts › Data Steward can successfully export test cases (shard 3, 1 retry)
  • Features/DataQuality/TestCaseIncidentPermissions.spec.ts › User with only VIEW cannot PATCH incidents (shard 3, 1 retry)
  • Features/DataQuality/TestCaseResultPermissions.spec.ts › User with only VIEW cannot PATCH results (shard 3, 1 retry)
  • Features/ImpactAnalysis.spec.ts › Verify Upstream connections (shard 3, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Rename domain (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Set & Update table-cp, hyperlink-cp, string, integer, markdown, number, duration, email, enum, sqlQuery, timestamp, entityReference, entityReferenceList, timeInterval, time-cp, date-cp, dateTime-cp Custom Property (shard 5, 1 retry)
  • Pages/ExplorePageRightPanel.spec.ts › Should allow Data Consumer to edit tags for table (shard 5, 1 retry)
  • Pages/UserDetails.spec.ts › Admin user can get all the roles hierarchy and edit roles (shard 6, 2 retries)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)
  • VersionPages/EntityVersionPages.spec.ts › Table (shard 6, 1 retry)
  • VersionPages/EntityVersionPages.spec.ts › Topic (shard 6, 1 retry)


How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

pmbrull and others added 2 commits March 19, 2026 21:02
- Use single Handle for entire restore (FK disable/enable + inserts on same session)
- Add binary data handling (val.isBinary() check) in restore to decode Base64
- Add ORDER BY via primary key discovery to prevent non-deterministic pagination
- Escape embedded quotes in quoteIdentifier to prevent SQL injection
- Validate archive table names against metadata to reject unknown tables
- Stream backup to temp file per table to avoid OOM on large tables
- Stream restore via JsonParser to avoid loading entire table into memory
- Use REPEATABLE READ transaction isolation for consistent backup snapshots
- Switch to positional parameters (?) to handle columns with special chars
- Replace hardcoded 'public' schema with current_schema() for PostgreSQL
- Use bind(idx, (Object) null) instead of bindNull with Types.VARCHAR
- Fix unused pattern variable (use 'number' binding in instanceof chain)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The val.isBinary() check in insertRowsStreaming was dead code because
Jackson's writeBinaryField encodes byte[] as base64 strings in JSON,
and readTree deserializes them as TextNode (not BinaryNode). This caused
binary data to be inserted as raw base64 strings instead of decoded bytes.

Store binary column names (discovered via information_schema.columns) in
the backup metadata so restore can decode base64 back to byte[] for the
correct columns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
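
The fix described above hinges on Base64 round-tripping. A stdlib-only illustration of the behavior — not the PR's actual code:

```java
import java.util.Base64;

public final class BinaryRoundTrip {
  // Jackson's writeBinaryField stores byte[] as a Base64 string in JSON;
  // reading it back with readTree yields a TextNode, not a BinaryNode.
  // Restore must therefore explicitly decode Base64 for columns that the
  // backup metadata records as binary.
  public static String encode(byte[] raw) {
    return Base64.getEncoder().encodeToString(raw);
  }

  public static byte[] decode(String base64Text) {
    return Base64.getDecoder().decode(base64Text);
  }
}
```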
pmbrull and others added 5 commits March 19, 2026 21:13
…tion, extract commons-compress version

- MigrationTestRunner: replace all System.out.println/printf with LOG.info
  in printSummary and centerText for proper logging
- MigrationTestRunner: add LOG.warn for partial migration state on failure
- OpenMetadataOperations: add --force flag to test-migration command to
  prevent accidental table truncation (mirrors restore command pattern)
- pom.xml: extract hardcoded commons-compress 1.27.1 to parent POM property

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…adata, and versionToPackage edge cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…flow

- Handle BigDecimal in backup to prevent truncation of DECIMAL/numeric values
- Add MAX_METADATA_SIZE limit and use readNBytes for defensive metadata parsing
- Move discoverBinaryColumns before data write for logical grouping
- Eliminate double FK disable/enable by restructuring force-restore flow
- Add DatasourceConfig.initialize to backup/restore CLI commands

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add --batch-size option to backup, restore, and test-migration commands.
Defaults to 1000 rows. Applies to both SQL pagination during backup and
insert batching during restore.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d AFTER tests on failure

- Add SAFE_IDENTIFIER regex to quoteIdentifier() for SQL injection defense-in-depth
- Use underscore separators in versionToPackage to prevent version collisions (e.g. 1.12.0 vs 1.1.20)
- Skip validateAfter when migration has failed to avoid confusing results

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 20, 2026 08:23

gitar-bot bot commented Mar 20, 2026

Code Review ✅ Approved 5 resolved / 5 findings

Adds comprehensive database backup and restore utilities with CLI commands for migration validation and testing. Resolves critical issues including FK check isolation, SQL injection via backup metadata, snapshot isolation during reads, version collision handling, and test execution safety.

✅ 5 resolved
Bug: FK checks disabled on separate handle, not effective for inserts

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/DatabaseBackupRestore.java:244-250 📄 openmetadata-service/src/main/java/org/openmetadata/service/util/DatabaseBackupRestore.java:303
At line 245, jdbi.useHandle(this::disableForeignKeyChecks) disables FK checks on a handle that is immediately closed. Then restoreTablesFromArchive (line 246) calls jdbi.useHandle(handle -> insertRows(...)) at line 303, which obtains a different connection from the pool. SET FOREIGN_KEY_CHECKS = 0 (MySQL) and SET session_replication_role = 'replica' (Postgres) are session-level settings — they only apply to the connection they were executed on. Since each jdbi.useHandle() may obtain a different pooled connection, the inserts will run with FK checks still enabled, causing failures when tables are restored in an order that violates foreign key constraints.

The same issue affects the enableForeignKeyChecks call in the finally block (line 249) — it may re-enable checks on yet another connection that never had them disabled.

Security: SQL injection via crafted backup archive table/column names

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/DatabaseBackupRestore.java:397-402 📄 openmetadata-service/src/main/java/org/openmetadata/service/util/DatabaseBackupRestore.java:284 📄 openmetadata-service/src/main/java/org/openmetadata/service/util/DatabaseBackupRestore.java:292
During restore, table names are extracted from tar entry paths (line 284) and metadata.json field names (line 233) — both sourced from the untrusted backup archive. These names flow into SQL statements via quoteIdentifier() (line 397-402), which wraps identifiers in backticks or double-quotes but does NOT escape those delimiter characters within the identifier itself. A crafted archive with a table name such as foo` DROP TABLE users; -- could inject arbitrary SQL, because the embedded backtick closes the quoted identifier early.

quoteIdentifier is used in SELECT (line 314), TRUNCATE (line 336), and INSERT (line 350) statements. Column names from metadata.json are similarly unvalidated (line 292, used in INSERT at line 347).

While the attack requires someone to provide a malicious .tar.gz file to the restore command, this is a realistic threat for a backup/restore utility — especially if backups are downloaded from remote storage.
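
A defensive quoteIdentifier doubles any embedded delimiter so a crafted name cannot close the quoted identifier early. A sketch under that assumption (method names are illustrative, not the PR's actual API):

```java
public final class Identifiers {
  // Quote a MySQL identifier; doubling embedded backticks keeps a crafted
  // name like foo`bar inside the quotes instead of terminating them.
  public static String quoteMySql(String name) {
    return "`" + name.replace("`", "``") + "`";
  }

  // Quote a PostgreSQL identifier; embedded double quotes are doubled.
  public static String quotePostgres(String name) {
    return "\"" + name.replace("\"", "\"\"") + "\"";
  }
}
```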

Edge Case: Backup reads tables without snapshot isolation

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/DatabaseBackupRestore.java:119-127
The backup() method reads tables sequentially within a single jdbi.useHandle callback, but without setting a transaction isolation level (e.g., REPEATABLE READ or SERIALIZABLE). If the database is being written to during backup, the resulting archive may contain an inconsistent snapshot — e.g., a foreign key reference in table A pointing to a row in table B that wasn't captured because it was deleted between the two table reads.

Bug: versionToPackage produces collisions for different versions

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/MigrationTestRunner.java:167
The versionToPackage method simply concatenates the numeric parts of a version string without any separator or fixed-width formatting. This means different versions can map to the same package name, causing the wrong test class to be loaded (or no test to be found).

Examples of collisions:

  • "1.12.0""v1120" and "1.1.20""v1120"
  • "1.1.0""v110" and "1.10""v110"

If a future migration version happens to collide with an existing one, the test framework will silently run the wrong test class or fail to find the correct one.
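
A later commit in this PR adopts underscore separators to remove this ambiguity. A sketch of such an injective mapping (class name assumed):

```java
public final class VersionPackages {
  // Map a migration version to a test package suffix. Keeping a separator
  // between the numeric parts makes the mapping injective: "1.12.0" and
  // "1.1.20" no longer collapse to the same "v1120".
  public static String versionToPackage(String version) {
    return "v" + version.replace(".", "_");
  }
}
```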

Edge Case: AFTER tests still run when migration fails

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/MigrationTestRunner.java:113
When a migration fails (lines 106-111), the migrationFailed flag is set but the code only checks testCase != null before running validateAfter (line 113). The else if (!migrationFailed) branch at line 115 only applies to the "no tests" placeholder — if a testCase exists, its validateAfter will still execute against a partially-migrated database before the break at line 119. This could produce confusing test results or exceptions from the after-validation running against an inconsistent schema.


Copilot AI left a comment

Pull request overview

This PR adds operational utilities to OpenMetadata’s Java service to support full database backup/restore into a .tar.gz archive and introduces a migration testing runner that restores a backup and executes pending migrations with optional per-version before/after assertions.

Changes:

  • Add DatabaseBackupRestore utility (JDBI-based) and wire new backup, restore, and test-migration CLI commands into OpenMetadataOperations.
  • Add MigrationTestRunner plus a small migration test API (MigrationTestCase, TestResult) and expose migrations list via a getter on MigrationWorkflow.
  • Add unit tests covering JDBC DB-name extraction, identifier quoting, metadata reading, and migration version → package mapping.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
pom.xml Adds commons-compress version property used for archive support.
openmetadata-service/pom.xml Adds Apache Commons Compress dependency for tar/gzip backup archives.
openmetadata-service/src/main/java/org/openmetadata/service/util/OpenMetadataOperations.java Adds backup, restore, test-migration CLI commands.
openmetadata-service/src/main/java/org/openmetadata/service/util/DatabaseBackupRestore.java Implements dynamic table/column discovery, tar.gz backup, and restore logic for MySQL/Postgres.
openmetadata-service/src/main/java/org/openmetadata/service/util/MigrationTestRunner.java Implements restore+stepwise migration execution with before/after validation and summary output.
openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationWorkflow.java Exposes migrations via Lombok @Getter to support the runner.
openmetadata-service/src/main/java/org/openmetadata/service/migration/api/MigrationTestCase.java Defines the before/after validation interface for migration tests.
openmetadata-service/src/main/java/org/openmetadata/service/migration/api/TestResult.java Adds a small result record with pass/fail helpers for assertions.
openmetadata-service/src/test/java/org/openmetadata/service/util/DatabaseBackupRestoreTest.java Unit tests for DB-name extraction, identifier quoting, and metadata.json reading.
openmetadata-service/src/test/java/org/openmetadata/service/util/MigrationTestRunnerTest.java Unit tests for versionToPackage parsing behavior.

"SELECT column_name FROM information_schema.columns "
+ "WHERE table_schema = current_schema() AND table_name = :table "
+ "AND (is_generated = 'NEVER' OR is_generated IS NULL) "
+ "AND (column_default NOT LIKE 'nextval%' OR column_default IS NULL) "
Copilot AI Mar 20, 2026

In Postgres, discoverColumns filters out columns whose column_default starts with nextval (sequence-backed/serial columns). That means those columns (e.g., openmetadata_settings.id) are omitted from the backup metadata and will not be restored, which breaks “full backup/restore” semantics and can also break FK relationships if any table references those IDs. Consider including these columns in backups and, after restore, resetting sequences to max(col) (or setval) so future inserts keep working.

Suggested change (remove the nextval filter so sequence-backed columns are included in the backup):
-            + "AND (column_default NOT LIKE 'nextval%' OR column_default IS NULL) "

Comment on lines +556 to +561
  private void disableForeignKeyChecks(Handle handle) {
    if (connectionType == ConnectionType.MYSQL) {
      handle.execute("SET FOREIGN_KEY_CHECKS = 0");
    } else {
      handle.execute("SET session_replication_role = 'replica'");
    }
Copilot AI Mar 20, 2026

For Postgres, SET session_replication_role = 'replica' generally requires superuser privileges. If OpenMetadata runs with a non-superuser DB user (common in managed Postgres), restore will fail before any data is loaded. Consider an alternative approach that works without superuser (e.g., restoring in FK-safe order, using deferrable constraints/SET CONSTRAINTS ALL DEFERRED where applicable, or documenting/enforcing the required privileges explicitly).
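
The non-superuser alternative mentioned here can be sketched as follows; it only helps for constraints that were declared DEFERRABLE:

```sql
BEGIN;
-- Defers checking of all DEFERRABLE constraints until COMMIT, so tables
-- can be loaded in any order without superuser privileges.
SET CONSTRAINTS ALL DEFERRED;
-- ... INSERT restored rows here ...
COMMIT;
```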

Comment on lines +202 to +203
} finally {
commitTransaction(handle);
Copilot AI Mar 20, 2026

The backup path wraps table reads in a transaction but always calls COMMIT in a finally block. If any SQL error occurs (especially on Postgres where the transaction becomes aborted), COMMIT will fail and may mask the original error; it also prevents a clean rollback. Consider committing only on success and issuing a ROLLBACK when an exception occurs (or use JDBI transaction helpers like useTransaction).

Suggested change
-    } finally {
-      commitTransaction(handle);
+      commitTransaction(handle);
+    } catch (Exception e) {
+      handle.rollback();
+      throw e;

Comment on lines +2699 to +2701
description =
"Number of rows per batch during restore. Default: "
+ DatabaseBackupRestore.DEFAULT_BATCH_SIZE)
Copilot AI Mar 20, 2026

The --batch-size option hardcodes defaultValue = "1000" but the description references DatabaseBackupRestore.DEFAULT_BATCH_SIZE. If the constant ever changes, CLI behavior and help text will diverge. Consider defining a single source of truth (e.g., keep the help text literal, or introduce a shared constant for both).

Suggested change
-      description =
-          "Number of rows per batch during restore. Default: "
-              + DatabaseBackupRestore.DEFAULT_BATCH_SIZE)
+      description = "Number of rows per batch during restore. Default: 1000")

Comment on lines +425 to +457
      TarArchiveEntry entry;
      while ((entry = tais.getNextEntry()) != null) {
        String name = entry.getName();
        if (!name.startsWith("tables/") || !name.endsWith(".json")) {
          continue;
        }

        String tableName = name.substring("tables/".length(), name.length() - ".json".length());

        if (!validTables.contains(tableName)) {
          LOG.warn("Table {} from archive not in metadata, skipping", tableName);
          continue;
        }

        JsonNode tableMetaNode = tablesMetadata.get(tableName);
        if (tableMetaNode == null) {
          LOG.warn("No metadata found for table {}, skipping", tableName);
          continue;
        }

        List<String> columns = new ArrayList<>();
        tableMetaNode.get("columns").forEach(col -> columns.add(col.asText()));

        Set<String> binaryColumns = new HashSet<>();
        JsonNode binaryColumnsNode = tableMetaNode.get("binaryColumns");
        if (binaryColumnsNode != null) {
          binaryColumnsNode.forEach(col -> binaryColumns.add(col.asText()));
        }

        LOG.info("Restoring table {}", tableName);
        int rowCount = insertRowsStreaming(handle, tableName, columns, binaryColumns, tais);
        LOG.info("Restored table {} ({} rows)", tableName, rowCount);
      }
Copilot AI Mar 20, 2026

restoreTablesFromArchive restores only tables that have a corresponding tables/<name>.json entry in the archive, but it does not validate that every table listed in metadata.json was actually present/restored. A truncated/corrupted backup could therefore “restore successfully” with missing tables/data. Consider tracking restored table names and failing if any tables from tablesMetadata are missing in the archive.
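
The completeness check suggested here reduces to a set difference. A stdlib-only sketch (class and method names are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public final class RestoreCompleteness {
  // Tables listed in metadata.json but never seen in the archive indicate a
  // truncated or corrupted backup; returning them lets restore fail loudly
  // instead of "succeeding" with missing data.
  public static Set<String> missingTables(Set<String> fromMetadata, Set<String> restored) {
    Set<String> missing = new HashSet<>(fromMetadata);
    missing.removeAll(restored);
    return missing;
  }
}
```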

Comment on lines +484 to +490
.fieldNames()
.forEachRemaining(
tableName -> {
String sql = String.format("TRUNCATE TABLE %s", quoteIdentifier(tableName));
handle.execute(sql);
LOG.info("Truncated table {}", tableName);
});
Copilot AI Mar 20, 2026

truncateAllTables runs TRUNCATE TABLE <table> per table. On Postgres this can fail when there are FK dependencies between tables (TRUNCATE enforces FK reference rules unless you use CASCADE or truncate all related tables in one statement). Consider using TRUNCATE ... CASCADE for Postgres (or truncating all tables in a single TRUNCATE statement) to make --force reliable.
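
A single multi-table TRUNCATE makes --force reliable on Postgres; table names here are illustrative, not the actual schema:

```sql
-- One statement truncates the whole set atomically; CASCADE also clears
-- dependent tables that are not listed, so use with care.
TRUNCATE TABLE table_entity, tag_usage, entity_relationship CASCADE;
```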

Comment on lines +292 to +335
      int offset = 0;
      while (true) {
        String sql =
            String.format(
                "SELECT %s FROM %s%s LIMIT %d OFFSET %d",
                quotedColumns, quotedTable, orderByClause, batchSize, offset);
        List<Map<String, Object>> rows = handle.createQuery(sql).mapToMap().list();

        for (Map<String, Object> row : rows) {
          gen.writeStartObject();
          for (String col : columns) {
            Object val = row.get(col);
            if (val == null) {
              gen.writeNullField(col);
            } else if (val instanceof Number number) {
              if (number instanceof Long l) {
                gen.writeNumberField(col, l);
              } else if (number instanceof Integer i) {
                gen.writeNumberField(col, i);
              } else if (number instanceof Double d) {
                gen.writeNumberField(col, d);
              } else if (number instanceof Float f) {
                gen.writeNumberField(col, f);
              } else if (number instanceof BigDecimal bd) {
                gen.writeNumberField(col, bd);
              } else {
                gen.writeNumberField(col, number.longValue());
              }
            } else if (val instanceof Boolean b) {
              gen.writeBooleanField(col, b);
            } else if (val instanceof byte[] bytes) {
              gen.writeBinaryField(col, bytes);
            } else {
              gen.writeStringField(col, val.toString());
            }
          }
          gen.writeEndObject();
          rowCount++;
        }

        if (rows.size() < batchSize) {
          break;
        }
        offset += batchSize;
Copilot AI Mar 20, 2026

writeTableToTempFile paginates with LIMIT ... OFFSET ... while increasing offset by batchSize. For large tables this becomes progressively slower (and can be very slow on Postgres/MySQL) because the database must scan/skip offset rows each time. Consider keyset pagination using the primary key(s) (or a stable cursor) to page efficiently, especially since you already discover PK columns.

Suggested change (the removed lines are the LIMIT/OFFSET loop quoted above; the replacement streams a single query with a JDBC fetch size instead):
+      String sql = String.format("SELECT %s FROM %s%s", quotedColumns, quotedTable, orderByClause);
+      Iterable<Map<String, Object>> rows =
+          handle.createQuery(sql).setFetchSize(batchSize).mapToMap();
+      for (Map<String, Object> row : rows) {
+        gen.writeStartObject();
+        for (String col : columns) {
+          // ... same per-value serialization as in the original loop ...
+        }
+        gen.writeEndObject();
+        rowCount++;
+      }

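
The keyset-pagination alternative referenced in this comment can be sketched as follows, assuming an illustrative `id` primary key and `json` column:

```sql
-- First page
SELECT id, json FROM table_entity ORDER BY id LIMIT 1000;
-- Subsequent pages: seek past the last id seen instead of scanning OFFSET rows
SELECT id, json FROM table_entity WHERE id > :last_id ORDER BY id LIMIT 1000;
```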
Comment on lines +2623 to +2625
description =
"Number of rows to read/write per batch. Default: "
+ DatabaseBackupRestore.DEFAULT_BATCH_SIZE)
Copilot AI Mar 20, 2026

The --batch-size option hardcodes defaultValue = "1000" but the description references DatabaseBackupRestore.DEFAULT_BATCH_SIZE. If the constant ever changes, CLI behavior and help text will diverge. Consider defining a single source of truth (e.g., keep the help text literal, or introduce a shared constant for both).

Suggested change
-      description =
-          "Number of rows to read/write per batch. Default: "
-              + DatabaseBackupRestore.DEFAULT_BATCH_SIZE)
+      description = "Number of rows to read/write per batch. Default: 1000")

Comment on lines +2657 to +2662
names = {"--batch-size"},
defaultValue = "1000",
description =
"Number of rows to insert per batch. Default: "
+ DatabaseBackupRestore.DEFAULT_BATCH_SIZE)
int batchSize) {
Copilot AI Mar 20, 2026

The --batch-size option hardcodes defaultValue = "1000" but the description references DatabaseBackupRestore.DEFAULT_BATCH_SIZE. If the constant ever changes, CLI behavior and help text will diverge. Consider defining a single source of truth (e.g., keep the help text literal, or introduce a shared constant for both).

Comment on lines +169 to +172
return null;
} catch (Exception e) {
LOG.warn("Failed to instantiate test class {}", className, e);
return null;
Copilot AI Mar 20, 2026

If a migration test class exists but fails to instantiate (constructor throws, linkage error, etc.), loadTestCase logs a warning and returns null, which later gets reported as “(no tests)” instead of a failing test. Consider surfacing this as a failed entry in the summary so broken test implementations are not silently ignored.

Suggested change
-      return null;
-    } catch (Exception e) {
-      LOG.warn("Failed to instantiate test class {}", className, e);
-      return null;
+      // No migration test defined for this version; treat as "no tests"
+      return null;
+    } catch (Exception e) {
+      String message = String.format("Failed to instantiate migration test class %s", className);
+      LOG.warn(message, e);
+      throw new RuntimeException(message, e);

